Service Class Driven Dynamic Data Source Discovery with DynaBot

Authors

  • Daniel Rocco
  • James Caverlee
  • Ling Liu
  • Terence Critchlow
Abstract

Dynamic Web data sources – sometimes known collectively as the Deep Web – increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed that of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web. To address these challenges, we present DYNABOT, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DYNABOT has three unique characteristics. First, DYNABOT utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DYNABOT employs a modular, self-tuning system architecture for focused crawling of the Deep Web using service class descriptions. Third, DYNABOT incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and matching algorithms and suggest techniques for efficiently managing service discovery in the face of the immense scale of the Deep Web.

Similar resources

Focused Crawling of the Deep Web Using Service Class Descriptions

Dynamic Web data sources—sometimes known collectively as the Deep Web—increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size ...

A heuristic method for consumable resource allocation in multi-class dynamic PERT networks

This investigation presents a heuristic method for the consumable resource allocation problem in multi-class dynamic Project Evaluation and Review Technique (PERT) networks, where new projects from different classes (types) arrive at the system according to independent Poisson processes with different arrival rates. Each activity of any project is operated at a devoted service station located in a n...

A Semantic Infrastructure for a Knowledge Driven Sensor Web

Sensor Web researchers are currently investigating middleware to aid in the dynamic discovery, integration and analysis of vast quantities of high quality, but distributed and heterogeneous earth observation data. Key challenges being investigated include dynamic data integration and analysis, service discovery and semantic interoperability. However, few efforts deal with the management of both...

Aspect Oriented UML to ECORE Model Transformation

With the emerging concept of model transformation, information can be extracted from one or more source models to produce the target models. The conversion of these models can be done automatically with specific transformation languages. This conversion requires mapping between both models with the help of dynamic hash tables. Hash tables store reference links between the elements of the source...

Ontology-Based QoS Driven GIS Grid Service Discovery

A semantic based approach for QoS driven service discovery is proposed to help clients select the currently best services matching their requests in a dynamic GGS (GIS Grid Services) Virtual Organization. Services are selected according to the functional requirements and dynamic Qualities of Services (QoS) requirements. This framework utilizes ontology to define the terminologies on GGS functio...

Journal title:
  • Int. J. Web Service Res.

Volume 4  Issue 

Pages  -

Publication date 2007